Earth Data Science Coding Challenge!¶
Before we get started, make sure to read or review the guidelines below. These will help make sure that your code is readable and reproducible.
Don't get caught by these Jupyter notebook gotchas¶

Image source: https://alaskausfws.medium.com/whats-big-and-brown-and-loves-salmon-e1803579ee36
These are the most common issues that will keep you from getting started and delay your code review:
- When you try to run some code on GitHub Codespaces, you may be prompted to select a kernel.
- The kernel refers to the version of Python you are using.
- You should use the base kernel, which should be the default option.
- You can also use the Select Kernel menu in the upper right to select the base kernel.
- Before you commit your work, make sure it runs reproducibly by clicking Restart (this button won't appear until you've run some code), then Run All.
Check your code to make sure it's clean and easy to read¶
- Format all cells prior to submitting (right click on your code).
- Use expressive names for variables so you or the reader knows what they are.
- Use comments to explain your code -- e.g.
# This is a comment, it starts with a hash sign
Label and describe your plots¶

Make sure each plot has:
- A title that explains where and when the data are from
- x- and y- axis labels with units where appropriate
- A legend where appropriate
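As a quick sketch of what a fully labeled plot looks like (the site name, years, and temperature values below are made up for illustration):

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
years = [2020, 2021, 2022]
ax.plot(years, [11.2, 11.5, 11.1], label="Station A (hypothetical)")
# Title says where and when the data are from
ax.set_title("Mean Annual Temperature, Example Site, 2020-2022")
# Axis labels include units where appropriate
ax.set_xlabel("Year")
ax.set_ylabel("Temperature (°C)")
ax.legend()
```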
Icons: how to use this notebook¶
We use the following icons to let you know when you need to change something to complete the challenge:
💻 means you need to write or edit some code.
📖 indicates recommended reading
✎ marks written responses to questions
🌶 is an optional extra challenge
Habitat modeling under climate change¶
Create a habitat suitability model for Sorghastrum nutans, a grass native to North America. In the past 50 years, its range has moved northward. The model will be based on combining multiple data layers related to soil, topography, and climate. You will also demonstrate the coding skills covered in this class by creating a modular, reproducible, object-oriented workflow for the model.
STEP 1: STUDY OVERVIEW¶
Your notebook should include an explanation of why and how you created your workflow. INCLUDE a sentence or two about the function/method and class you created, and why you chose it.
Classes and objects in Python¶
You can think of an object as a collection of named values and functions that make use of those values. When they are part of an object, we call them attributes and methods. Each object is created from a class, or object template. Examples of classes you have used include the pandas DataFrame and xarray DataArray. As an example, xr.DataArray includes attributes like .values, and methods like .mean(). It also contains some special attributes created when the object is initialized for each of the coordinates (e.g. x and y if your DataArray has those as coordinates).
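As a minimal sketch (the class and attribute names here are invented for illustration), a class of your own bundles attributes and methods in exactly the same way:

```python
class StudySite:
    """A tiny example class: attributes store values, methods use them."""

    def __init__(self, name, bounds):
        # Attributes are created when the object is initialized
        self.name = name
        self.bounds = bounds  # (min_lon, min_lat, max_lon, max_lat)

    def center(self):
        # A method can compute with the object's own attributes
        min_lon, min_lat, max_lon, max_lat = self.bounds
        return ((min_lon + max_lon) / 2, (min_lat + max_lat) / 2)

site = StudySite("Example Site", (-104.0, 42.0, -103.0, 43.0))
print(site.name, site.center())  # → Example Site (-103.5, 42.5)
```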
You can absolutely write a reproducible scientific workflow without writing any of your own classes. However, there are some situations where custom classes can come in handy. Consider the following examples:
- Suppose you are writing code to interface with an API. There are going to be multiple steps, such as logging in, searching for data, and downloading data. Throughout the whole interaction, you want to keep track of the dates and bounding box for the data -- and you might even want to access those values after the data are completely downloaded! If you write a single function, you'll run into three problems: 1) the function will be very long, 2) the function may take a long time to run, making it challenging to debug and test, and 3) you'll lose all the named values you define inside the function when it finishes running unless you pass them explicitly as return values. Using a class solves all these problems. You can split up the different parts of the workflow into shorter methods that all have access to the same information, such as bounding box or HTTP Session information.
- Suppose you are assembling a data cube (aligned and harmonized data from heterogeneous sources). Eventually, you will have a single xarray DataArray or Dataset. However, in the meantime you need to keep track of many data files such that you can access them by name, by attribute, or by order. Often we use the pandas DataFrame for this type of thing. However, it would be nice if the DataFrame did a couple of things differently. For example, when it prints you would like it to display the bounding box of each DataArray instead of the array itself (which is computationally expensive). Or, maybe when you create the DataFrame you would like it to search for all file names that match a pattern and extract key information. You can accomplish these goals by creating a child class of pandas DataFrame that inherits all its capabilities and modifies its methods, perhaps adding some new ones.
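A sketch of that second idea (the class name, columns, and values below are invented for illustration, not a real catalog format): a child class of pandas DataFrame inherits everything, and can override __repr__ to print bounding boxes instead of array contents:

```python
import pandas as pd

class RasterCatalog(pd.DataFrame):
    """Hypothetical DataFrame child class that prints a compact,
    one-line-per-layer summary instead of the stored arrays."""

    @property
    def _constructor(self):
        # Make pandas operations (head, slicing, ...) return a
        # RasterCatalog rather than a plain DataFrame
        return RasterCatalog

    def __repr__(self):
        # Show only layer names and bounding boxes -- cheap to render
        return "\n".join(
            f"{name}: bounds={bounds}"
            for name, bounds in zip(self["name"], self["bounds"]))

catalog = RasterCatalog({
    "name": ["elevation", "ph"],
    "bounds": [(-104, 42, -103, 43), (-104, 42, -103, 43)],
    "array": [None, None],  # stand-ins for large DataArrays
})
print(catalog)
```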
As you revisit your habitat modeling workflow, think carefully about at least one class you would like to modify or create.
Using climate model data¶
You will use MACAv2 data for historical and future climate. Be sure to compare at least two emissions scenarios, as well as historic or current climatology as a control. When downloading climate data, bear in mind that:
- In general, we are interested in climate normals, which are typically calculated from 30 years of data so that they reflect the climate rather than any single year. So if you are interested in scenarios around 2050, download data from at least 2035-2065.
- There is a great deal of uncertainty in climate models. We deal with it by using an ensemble of models to try to capture the range of possibilities. For each scenario, you should attempt to include models that are:
- Warm and wet
- Warm and dry
- Cold and wet
- Cold and dry
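The arithmetic behind a climate normal is just a multi-decade mean. The sketch below uses made-up annual mean temperatures for 2035-2065 standing in for one MACAv2 model run:

```python
# Hypothetical annual mean temperatures (°C) for one model run;
# the linear warming trend here is invented for illustration.
annual_tmean = {year: 10.0 + 0.03 * (year - 2035)
                for year in range(2035, 2066)}

# The 30-year climate normal centered on 2050 is just the mean
# over the whole window, smoothing out single-year extremes.
normal_2050 = sum(annual_tmean.values()) / len(annual_tmean)
print(round(normal_2050, 2))  # → 10.45
```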
You will create a reproducible scientific workflow¶
Your workflow should:
- Define your study area: Download the USFS National Grassland Units and select your study sites.
- Fit a model: For each grassland:
- Download model variables as raster layers covering your study area envelope, including:
- At least one soil variable from the POLARIS dataset
- Elevation from the SRTM (available from the APPEEARS API)
- The 30 year periods of precipitation and temperature from the MACAv2 dataset, accessible from Climate Toolbox. Include an arrangement of emissions scenarios and time periods, following the guidelines above.
- Calculate at least one derived **topographic variable** (slope or aspect) to use in your model. You probably will wish to use the xarray-spatial library, which is available in the latest earth-analytics-python environment (but will need to be installed/updated if you are working on your own machine). Note that calculated slope may not be correct if you are using a CRS with units of degrees; you should re-project into a projected coordinate system with units of meters, such as the appropriate UTM Zone.
- Harmonize your data - make sure that the grids for each of your layers match up. Check out the ds.rio.reproject_match() method from rioxarray.
- Build your model. You can use any model you wish, so long as you explain your choice. However, if you are not sure what to do, we recommend building a fuzzy logic model (see below).
- Present your results in at least one figure for each grassland/climate scenario combination.
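If you go the fuzzy logic route, the core idea can be sketched with plain NumPy: each layer gets a membership function mapping raw values to a suitability score in [0, 1], and layers combine with a fuzzy AND (elementwise minimum). The tolerance ranges below are invented for illustration, not real Sorghastrum nutans requirements:

```python
import numpy as np

def ramp_membership(values, low, optimal_low, optimal_high, high):
    """Trapezoidal fuzzy membership: 0 outside [low, high], 1 inside
    [optimal_low, optimal_high], linear ramps in between."""
    values = np.asarray(values, dtype=float)
    rising = np.clip((values - low) / (optimal_low - low), 0, 1)
    falling = np.clip((high - values) / (high - optimal_high), 0, 1)
    return np.minimum(rising, falling)

# Hypothetical soil pH tolerances (illustrative values only)
ph = np.array([4.5, 6.0, 7.0, 9.0])
ph_suit = ramp_membership(ph, 4.8, 5.5, 7.5, 8.0)

# Hypothetical annual precipitation tolerances, mm (illustrative)
precip = np.array([200.0, 700.0, 900.0, 1500.0])
precip_suit = ramp_membership(precip, 280.0, 500.0, 1100.0, 1300.0)

# Fuzzy AND: a cell is only as suitable as its worst layer
suitability = np.minimum(ph_suit, precip_suit)
print(suitability)  # → [0. 1. 1. 0.]
```

The same membership functions apply unchanged to 2-D DataArrays once the layers are harmonized onto a common grid.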
# Import packages used for the analysis
import os
from glob import glob
import math
import warnings
import cftime
import earthpy as et
import earthpy.appeears as eaapp
import geopandas as gpd
import geoviews as gv
import holoviews as hv
import hvplot.pandas
import hvplot.xarray
import pandas as pd
import requests
import matplotlib.pyplot as plt
import numpy as np
import rioxarray as rxr
import rioxarray.merge as rxrmerge
import skfuzzy as fuzz
import skfuzzy.control as ctrl
import xarray as xr
import xrspatial
from urllib.parse import urlparse
# I chose to reproject each grassland to its local UTM zone:
# one grassland lies in zone 13 and the other in zone 14.
utm_epsg_13 = 32613
utm_epsg_14 = 32614
# Create data directories
data_dir = os.path.join(et.io.HOME, et.io.DATA_NAME)
# Ignore RuntimeWarning
warnings.filterwarnings("ignore", category=RuntimeWarning)
class Grassland:
    """Represent one National Grassland study site and its data layers."""

    def __init__(self, name, utm):
        self.name = name
        self.utm = utm
        self.grassgdf = None

    def load_grass_boundaries(self):
        """Download the USFS National Grassland boundaries, select this site."""
        grasslands_url = (
            "https://data.fs.usda.gov/geodata/edw/edw_resources/shp"
            "/S_USA.NationalGrassland.zip")
        grasslands_gdf = (
            gpd.read_file(grasslands_url)
            .set_index('GRASSLANDN'))
        self.grassgdf = grasslands_gdf.loc[[self.name]]
        # Keep a projected copy for metric operations like clipping
        self.polygon = self.grassgdf.to_crs(self.utm)

    def get_site_map(self):
        """Plot the site boundary over an Esri NatGeo basemap."""
        site = (
            gv.tile_sources.EsriNatGeo
            * gv.Polygons(self.grassgdf).opts(
                fill_alpha=0,
                line_color='green',
                line_width=5)
        ).opts(
            width=800,
            height=600,
            title=self.name)
        return site

    def load_site_ph(self):
        """Download, merge, and clip POLARIS soil pH (60-100 cm) tiles."""
        polaris_das = []
        min_lon, min_lat, max_lon, max_lat = self.grassgdf.total_bounds
        polaris_url_format = (
            "http://hydrology.cee.duke.edu/POLARIS/PROPERTIES/v1.0"
            "/ph/mean/60_100"
            "/lat{min_lat}{max_lat}_lon{min_lon}{max_lon}.tif")
        # POLARIS tiles cover 1 degree x 1 degree; download every tile
        # that intersects the site's bounding box.
        for lon in range(math.floor(min_lon), math.ceil(max_lon)):
            for lat in range(math.floor(min_lat), math.ceil(max_lat)):
                polaris_url = polaris_url_format.format(
                    min_lat=lat, max_lat=lat + 1,
                    min_lon=lon, max_lon=lon + 1)
                polaris_das.append(
                    rxr.open_rasterio(polaris_url, masked=True).squeeze())
        self.ph_da = (
            rxrmerge.merge_arrays(polaris_das)
            .rio.reproject(self.utm)
            .rio.clip_box(*self.polygon.total_bounds))

    def download_srtm_data(self):
        """
        Download SRTM elevation data for this grassland.

        Requests NASA Shuttle Radar Topography Mission (SRTM) Global
        1 arc-second data from the APPEEARS API for this site's boundary,
        skipping the request if the files are already on disk. The SRTM
        launched February 11, 2000 and flew for 11 days.
        <https://lpdaac.usgs.gov/products/srtmgl1v003/>

        Sets `self.srtm_path` to the list of downloaded GeoTIFF paths.
        """
        print(f'\nGRASSLANDN: {self.name}')
        srtm_downloader = eaapp.AppeearsDownloader(
            download_key=self.name.lower().replace(' ', '-'),
            product='SRTMGL1_NC.003',
            layer='SRTMGL1_DEM',
            start_date='02-11-2000',
            end_date='02-21-2000',
            polygon=self.grassgdf,
            ea_dir=os.path.join(et.io.HOME, 'earth-analytics'))
        if not os.path.exists(srtm_downloader.data_dir):
            srtm_downloader.submit_task_request()
            print("SRTM requested and downloading...")
            srtm_downloader.download_files()
        else:
            print("SRTM data already downloaded.")
        self.srtm_path = glob(os.path.join(
            srtm_downloader.data_dir,
            'SRTMGL1_NC.003*',
            '*.tif'))

    def load_srtm(self):
        """
        Load the downloaded SRTM raster and align it to the pH grid.

        Returns
        =======
        grass_srtm_da : xr.DataArray
            Elevation data reprojected to match `self.ph_da`.
        """
        print(f'\nGRASSLANDN: {self.name}')
        self.grass_srtm_da = (
            rxr.open_rasterio(self.srtm_path[0], masked=True)
            .squeeze()
            .rio.reproject_match(self.ph_da))
        print(f"Loaded SRTM data for {self.name}")
        return self.grass_srtm_da

    def calc_aspect(self):
        """
        Calculate aspect from the loaded SRTM elevation data.

        Returns
        =======
        aspect_da : xr.DataArray
            DataArray with aspect in degrees clockwise from north.
        """
        self.aspect_da = xrspatial.aspect(self.grass_srtm_da)
        print(f"Calculated aspect for {self.name}")
        return self.aspect_da
oglala = Grassland("Oglala National Grassland", utm_epsg_13)
fort_pierre = Grassland("Fort Pierre National Grassland", utm_epsg_14)
oglala.load_grass_boundaries()
fort_pierre.load_grass_boundaries()
fort_pierre.grassgdf
fort_pierre.get_site_map()
oglala.get_site_map()
oglala.grassgdf.total_bounds
oglala.load_site_ph()
oglala.download_srtm_data()
oglala.load_srtm()
oglala.calc_aspect()
fort_pierre.load_site_ph()
fort_pierre.download_srtm_data()
fort_pierre.load_srtm()
fort_pierre.calc_aspect()
GRASSLANDN: Oglala National Grassland
SRTM data already downloaded.

GRASSLANDN: Oglala National Grassland
Loaded SRTM data for Oglala National Grassland
Calculated aspect for Oglala National Grassland

GRASSLANDN: Fort Pierre National Grassland
SRTM data already downloaded.

GRASSLANDN: Fort Pierre National Grassland
Loaded SRTM data for Fort Pierre National Grassland
Calculated aspect for Fort Pierre National Grassland
<xarray.DataArray 'aspect' (y: 1160, x: 1121)>
array([[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]], dtype=float32)
Coordinates:
band int32 1
spatial_ref int32 0
* x (x) float64 3.819e+05 3.819e+05 ... 4.148e+05 4.148e+05
* y (y) float64 4.904e+06 4.904e+06 4.904e+06 ... 4.87e+06 4.87e+06
Attributes:
add_offset: 0.0
AREA_OR_POINT: Area
scale_factor: 1.0
    units:          Meters
fort_pierre.aspect_da.hvplot(
    x='x', y='y',
    colormap="colorwheel",
    title="Aspect at Fort Pierre National Grassland")
oglala.ph_da.hvplot() * oglala.polygon.hvplot(color=None, line_color='white')